AITopics | t-sne and umap

t-sne and umap

Stop Misusing t-SNE and UMAP for Visual Analytics

Jeon, Hyeon; Park, Jeongin; Shin, Sungbok; Seo, Jinwook

arXiv.org Artificial Intelligence, Oct-2-2025

Misuses of t-SNE and UMAP in visual analytics have become increasingly common. For example, although t-SNE and UMAP projections often do not faithfully reflect the original distances between clusters, practitioners frequently use them to investigate inter-cluster relationships. We investigate why this misuse occurs, and discuss methods to prevent it. To that end, we first review 136 papers to verify the prevalence of the misuse. We then interview researchers who have used dimensionality reduction (DR) to understand why such misuse occurs. Finally, we interview DR experts to examine why previous efforts failed to address the misuse. We find that the misuse of t-SNE and UMAP stems primarily from limited DR literacy among practitioners, and that existing attempts to address this issue have been ineffective. Based on these insights, we discuss potential paths forward, including the controversial but pragmatic option of automating the selection of optimal DR projections to prevent misleading analyses.
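
To make the inter-cluster misuse concrete, here is a minimal sketch of one sanity check: does a 2-D projection rank the distances between cluster centroids the same way the original space does? It assumes the projection coordinates have already been computed (for example by t-SNE or UMAP) and that cluster labels are known. The function names are hypothetical, and this is not the measure or procedure used by Jeon et al.

```python
import math
from itertools import combinations


# Hypothetical helper names for illustration; not taken from the paper.
def centroids(points, labels):
    """Mean position of each labelled cluster."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(c) / len(ps) for c in zip(*ps)) for lab, ps in groups.items()}


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra))
    sb = math.sqrt(sum((y - mb) ** 2 for y in rb))
    return cov / (sa * sb)


def inter_cluster_agreement(high, low, labels):
    """Rank agreement of centroid-to-centroid distances before and after projection."""
    ch, cl = centroids(high, labels), centroids(low, labels)
    pairs = list(combinations(sorted(ch), 2))
    d_high = [math.dist(ch[a], ch[b]) for a, b in pairs]
    d_low = [math.dist(cl[a], cl[b]) for a, b in pairs]
    return spearman(d_high, d_low)
```

A value near 1 means the projection orders cluster separations as the original data does; a value near or below 0 is a warning against reading inter-cluster relationships off the plot. At least three clusters whose centroid distances are not all tied are needed, otherwise the correlation is undefined.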

machine learning, natural language, t-sne and umap

arXiv.org Artificial Intelligence

2506.08725

Country:

North America > United States (1.00)
Europe (0.67)

Genre:

Overview (1.00)
Questionnaire & Opinion Survey (0.93)
Research Report > New Finding (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Federated t-SNE and UMAP for Distributed Data Visualization

Qiao, Dong; Ma, Xinxian; Fan, Jicong

arXiv.org Artificial Intelligence, Dec-17-2024

High-dimensional data visualization is crucial in the big data era, and techniques such as t-SNE and UMAP have been widely used in science and engineering. Big data, however, is often distributed across multiple data centers and subject to security and privacy concerns, which makes the standard t-SNE and UMAP algorithms difficult to apply. To tackle this challenge, this work proposes Fed-tSNE and Fed-UMAP, which provide high-dimensional data visualization under the federated learning framework, without exchanging data across clients or sending data to the central server. The main idea of Fed-tSNE and Fed-UMAP is to implicitly learn the distribution of the data in a federated manner and then estimate the global distance matrix for t-SNE and UMAP. To further enhance data privacy protection, we propose Fed-tSNE+ and Fed-UMAP+. We also extend our idea to federated spectral clustering, yielding algorithms for clustering distributed data. In addition to these new algorithms, we offer theoretical guarantees of optimization convergence, distance and similarity estimation, and differential privacy. Experiments on multiple datasets demonstrate that, compared to the original algorithms, our federated algorithms incur only tiny drops in accuracy.
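
Because t-SNE and UMAP only consume pairwise distances, an estimate of the global distance matrix is enough to run them. The sketch below illustrates that idea with a swapped-in, much simpler technique than the one in the paper: every client shares only its points' distances to a set of shared anchor points (the anchors and all function names are hypothetical), and the server brackets each cross-client distance with triangle-inequality bounds. It offers no formal privacy guarantee and is not Fed-tSNE.

```python
import math


# Hypothetical landmark scheme for illustration; not the Fed-tSNE algorithm.
def anchor_profile(points, anchors):
    """Client-side step: distances from each local point to every shared anchor."""
    return [[math.dist(p, a) for a in anchors] for p in points]


def distance_bounds(profile_x, profile_z):
    """Server-side step: triangle-inequality bounds on d(x, z) from anchor profiles."""
    lower = max(abs(dx - dz) for dx, dz in zip(profile_x, profile_z))
    upper = min(dx + dz for dx, dz in zip(profile_x, profile_z))
    return lower, upper


def estimated_distance_matrix(profiles):
    """Midpoint of the bounds as a crude estimate of every pairwise distance."""
    n = len(profiles)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            lo, hi = distance_bounds(profiles[i], profiles[j])
            d[i][j] = d[j][i] = (lo + hi) / 2
    return d
```

The bounds tighten as anchors are placed near the data; when a point coincides with an anchor, both bounds equal the true distance. The resulting matrix could then be handed to any t-SNE or UMAP implementation that accepts precomputed distances.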

artificial intelligence, data mining, machine learning

arXiv.org Artificial Intelligence

2412.13495

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.81)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

NeuroDAVIS: A neural network model for data visualization

Maitra, Chayan; Seal, Dibyendu B.; De, Rajat K.

arXiv.org Artificial Intelligence, Apr-1-2023

Dimensionality reduction and visualization of high-dimensional datasets has long been a challenging problem. Modern high-throughput technologies produce new high-dimensional datasets with multiple views and relatively new data types. Visualizing these datasets requires a methodology that can uncover hidden patterns in the data while preserving its local and global structures. However, very few such methods exist. In this work, we introduce a novel unsupervised deep neural network model, called NeuroDAVIS, for data visualization. NeuroDAVIS can extract important features from the data without assuming any data distribution, and can visualize the data effectively in a lower dimension. It has been shown theoretically that the neighbourhood relationships of the data in high dimension are preserved in the lower dimension. The performance of NeuroDAVIS has been evaluated on a wide variety of synthetic and real high-dimensional datasets, including numeric, textual, image and biological data. NeuroDAVIS has been highly competitive with both t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) with respect to visualization quality and the preservation of data size, shape, and both local and global structure. It has also outperformed Fast interpolation-based t-SNE (Fit-SNE), a variant of t-SNE, on most of the high-dimensional datasets. For the biological datasets, besides t-SNE, UMAP and Fit-SNE, NeuroDAVIS has also performed well compared to other state-of-the-art algorithms, such as Potential of Heat-diffusion for Affinity-based Trajectory Embedding (PHATE) and the siamese neural network-based method IVIS. Downstream classification and clustering analyses have also revealed favourable results for NeuroDAVIS-generated embeddings.
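
The claim that neighbourhood relationships survive the reduction can be checked empirically on any embedding. A minimal sketch, assuming points are coordinate tuples and using a simple k-nearest-neighbour overlap score (a common diagnostic, not the theoretical argument or the evaluation metric from the NeuroDAVIS paper; function names are hypothetical):

```python
import math


# Hypothetical k-NN overlap diagnostic; not the NeuroDAVIS evaluation protocol.
def knn_indices(points, k):
    """Set of indices of the k nearest neighbours of each point (excluding itself)."""
    result = []
    for i, p in enumerate(points):
        # Ties in distance are broken by index so the result is deterministic
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append({j for _, j in others[:k]})
    return result


def neighbourhood_preservation(high, low, k=5):
    """Average fraction of each point's k high-D neighbours kept in the low-D map."""
    nh, nl = knn_indices(high, k), knn_indices(low, k)
    return sum(len(a & b) / k for a, b in zip(nh, nl)) / len(high)
```

A score of 1.0 means every point keeps all k of its original neighbours; this brute-force version is quadratic in the number of points, so a spatial index would be needed for large datasets.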

artificial intelligence, machine learning, neurodavis

arXiv.org Artificial Intelligence

2304.01222

Country:

South America (0.04)
North America (0.04)
Africa (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Understanding t-Distributed Stochastic Neighbor Embedding part1 (Artificial Intelligence)

#artificialintelligence, Jul-4-2022, 14:35:18 GMT

Abstract: We consider the mobile localization problem in future millimeter-wave wireless networks with distributed Base Stations (BSs) based on multi-antenna channel state information (CSI). For this problem, we propose a Semi-supervised t-distributed Stochastic Neighbor Embedding (St-SNE) algorithm to directly embed the high-dimensional CSI samples into the 2D geographical map. We evaluate the performance of St-SNE in a simulated urban outdoor millimeter-wave radio access network. Our results show that St-SNE achieves a mean localization error of 6.8 m with only 5% of labeled CSI samples in a 200 m × 200 m area with a ray-tracing channel model.

Abstract: Neighbor embedding methods t-SNE and UMAP are the de facto standard for visualizing high-dimensional datasets.
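
The headline number is a mean localization error, i.e. the average Euclidean distance between estimated and true user positions. A minimal sketch of that metric, assuming both position lists are 2-D coordinates in metres (the function name is hypothetical, and the St-SNE embedding that produces the estimates is not shown):

```python
import math


# Hypothetical helper; computes only the reported metric, not St-SNE itself.
def mean_localization_error(estimated, true_positions):
    """Mean Euclidean distance (metres) between estimated and true 2-D positions."""
    if not estimated or len(estimated) != len(true_positions):
        raise ValueError("need two non-empty position lists of equal length")
    errors = [math.dist(e, t) for e, t in zip(estimated, true_positions)]
    return sum(errors) / len(errors)
```

For example, one estimate 5 m off and one exact estimate give a mean error of 2.5 m.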

artificial intelligence, t-distributed stochastic neighbor embedding part1, visualization

#artificialintelligence

Genre: Research Report > New Finding (0.56)

Industry: Telecommunications (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.74)
Information Technology > Communications > Networks (0.56)

Visualizing Data using GTSNE

Shi, Songting

arXiv.org Machine Learning, Aug-3-2021

High-dimensional data visualization is a very important problem for humans making sense of data. Currently, the state-of-the-art methods are t-SNE (van der Maaten and Hinton (2008), van der Maaten (2013)) and UMAP (McInnes and Healy (2018)), which share a similar principle for nonlinear dimensionality reduction. They use neighborhood probability distributions to connect the high-dimensional data points to the low-dimensional map points, trying to keep local relative neighborhood relations unchanged while ignoring changes in the macro structure of the data. As a result, the low-dimensional map points may represent the high-dimensional structure unfaithfully. During the low-dimensional neighborhood-preserving and patching process, t-SNE sometimes breaks the neighborhood relations of the high-dimensional structure in the low-dimensional space. We add a macro loss term to the t-SNE loss that makes it preserve the relative structure of the k-means centroids between the high- and low-dimensional spaces, which essentially keeps the macro structure unchanged in the low-dimensional space.
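
The "neighborhood probability distribution" shared by t-SNE and its variants can be written down directly. The sketch below computes the standard t-SNE objective of van der Maaten and Hinton (2008): Gaussian conditional affinities p_{j|i} whose bandwidth is tuned so that exp(entropy) equals a target perplexity, their symmetrised joint P, the Student-t joint Q over map points, and KL(P || Q). It is a plain-Python illustration with hypothetical function names; GTSNE's additional macro (k-means centroid) loss term is not included, and no gradient descent is performed.

```python
import math


def conditional_p(d2_row, i, perplexity, tol=1e-5, max_iter=100):
    """Row i of p_{j|i}: Gaussian weights, precision beta tuned so exp(entropy) = perplexity."""
    target = math.log(perplexity)  # entropy in nats, so perplexity = exp(H)
    d_min = min(d for j, d in enumerate(d2_row) if j != i)
    beta, lo, hi = 1.0, 0.0, math.inf
    for _ in range(max_iter):
        # Shifting by d_min keeps the largest weight at 1 and avoids underflow
        w = [0.0 if j == i else math.exp(-beta * (d - d_min)) for j, d in enumerate(d2_row)]
        total = sum(w)
        p = [x / total for x in w]
        entropy = -sum(x * math.log(x) for x in p if x > 0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # too flat: increase precision (narrower Gaussian)
            lo = beta
            beta = beta * 2 if hi == math.inf else (beta + hi) / 2
        else:  # too peaked: decrease precision (wider Gaussian)
            hi = beta
            beta = (lo + beta) / 2
    return p


def joint_p(points, perplexity):
    """Symmetrised high-dimensional affinities p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    n = len(points)
    d2 = [[math.dist(a, b) ** 2 for b in points] for a in points]
    cond = [conditional_p(d2[i], i, perplexity) for i in range(n)]
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def joint_q(embedding):
    """Low-dimensional affinities from a Student-t kernel with one degree of freedom."""
    w = [[0.0 if i == j else 1.0 / (1.0 + math.dist(a, b) ** 2)
          for j, b in enumerate(embedding)] for i, a in enumerate(embedding)]
    total = sum(map(sum, w))
    return [[x / total for x in row] for row in w]


def kl_divergence(p, q):
    """The t-SNE cost KL(P || Q), summed over off-diagonal pairs with p_ij > 0."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q) for pij, qij in zip(prow, qrow) if pij > 0)
```

t-SNE then moves the map points to minimise this KL divergence; a macro term such as GTSNE's would be added to that cost.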

gtsne, macro structure

arXiv.org Machine Learning

2108.01301

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Extracting the main trend in a dataset: the Sequencer algorithm

Baron, Dalya; Ménard, Brice

arXiv.org Machine Learning, Jun-24-2020

Scientists aim to extract simplicity from observations of the complex world. An important component of this process is the exploration of data in search of trends. In practice, however, this tends to be more of an art than a science. Among all trends existing in the natural world, one-dimensional trends, often called sequences, are of particular interest as they provide insights into simple phenomena. However, some are challenging to detect as they may be expressed in complex ways. We present the Sequencer, an algorithm designed to generically identify the main trend in a dataset. It does so by constructing graphs describing the similarities between pairs of observations, computed with a set of metrics and scales. Using the fact that continuous trends lead to more elongated graphs, the algorithm can identify which aspects of the data are relevant in establishing a global sequence. Such an approach can be used beyond the proposed algorithm and can optimize the parameters of any dimensionality reduction technique. We demonstrate the power of the Sequencer using real-world data from astronomy and geology, as well as images from the natural world. We show that, in a number of cases, it outperforms the popular t-SNE and UMAP dimensionality reduction techniques. This approach to exploratory data analysis, which relies on neither training nor the tuning of any parameter, has the potential to enable discoveries in a wide range of scientific domains. The source code is available on GitHub, and we provide an online interface at http://sequencer.org.
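
The intuition that "continuous trends lead to more elongated graphs" can be illustrated with a minimum spanning tree (MST). The sketch below builds an MST over Euclidean distances with Prim's algorithm, finds its longest path in hops with two breadth-first searches, and reports a hypothetical elongation score (longest path hops divided by n - 1, so a perfect chain scores 1.0). This is a simplified proxy under those assumptions, not the Sequencer's actual multi-metric, multi-scale procedure; in particular, points off the longest path are simply left out of the returned order.

```python
import math
from collections import deque


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns adjacency lists."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    adj = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append(parent[u])
            adj[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u
    return adj


def farthest(adj, start):
    """BFS over the tree; returns (farthest node, its hop count, predecessor map)."""
    hops, prev = {start: 0}, {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in hops:
                hops[v], prev[v] = hops[u] + 1, u
                queue.append(v)
    end = max(hops, key=hops.get)
    return end, hops[end], prev


def sequence_and_elongation(points):
    """Order points along the MST's longest path; elongation = path hops / (n - 1)."""
    dist = [[math.dist(a, b) for b in points] for a in points]
    adj = minimum_spanning_tree(dist)
    a, _, _ = farthest(adj, 0)          # one end of the tree's longest path
    b, length, prev = farthest(adj, a)  # the other end, plus the path back to a
    path, node = [], b
    while node is not None:
        path.append(node)
        node = prev[node]
    return path, length / (len(points) - 1)
```

Shuffled points on a line give a chain-shaped tree (elongation 1.0) and are returned in their sorted order, while a star-shaped cloud gives a much lower score.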

artificial intelligence, dataset, upstream oil & gas

arXiv.org Machine Learning

2006.13948

Country:

Asia > Middle East > Israel (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Deep Learning Multidimensional Projections

Espadoto, Mateus; Hirata, Nina S. T.; Telea, Alexandru C.

arXiv.org Machine Learning, Feb-21-2019

Dimensionality reduction methods, also known as projections, are frequently used for exploring multidimensional data in machine learning, data science, and information visualization. Among these, t-SNE and its variants have become very popular for their ability to visually separate distinct data clusters. However, such methods are computationally expensive for large datasets, suffer from stability problems, and cannot directly handle out-of-sample data. We propose a learning approach to construct such projections. We train a deep neural network on a collection of samples from a given data universe and their corresponding projections, and then use the network to infer projections of data from the same, or similar, universes. Our approach generates projections with characteristics similar to the learned ones, is computationally two to three orders of magnitude faster than SNE-class methods, has no user parameters that are complex to set, handles out-of-sample data in a stable manner, and can be used to learn any projection technique. We demonstrate our proposal on several real-world high-dimensional datasets from machine learning.
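
The core idea, fitting a regressor on (sample, projection) pairs and then reusing it on unseen data, can be shown without a deep network. The sketch below swaps in a plain linear map fitted by full-batch gradient descent on mean squared error, so it can only imitate projections that are close to linear; the `reference` projection is a hypothetical linear function chosen so the fit can be checked exactly. None of this is the authors' network architecture or training setup.

```python
import random


# Linear stand-in for the paper's deep network; all names are hypothetical.
def fit_linear_projection(samples, projections, lr=0.05, epochs=3000):
    """Fit y ~ W x + b by full-batch gradient descent on mean squared error."""
    d, m, n = len(samples[0]), len(projections[0]), len(samples)
    W = [[0.0] * d for _ in range(m)]
    b = [0.0] * m
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(m)]
        gb = [0.0] * m
        for x, y in zip(samples, projections):
            for k in range(m):
                err = sum(W[k][i] * x[i] for i in range(d)) + b[k] - y[k]
                for i in range(d):
                    gW[k][i] += 2 * err * x[i] / n
                gb[k] += 2 * err / n
        for k in range(m):
            for i in range(d):
                W[k][i] -= lr * gW[k][i]
            b[k] -= lr * gb[k]
    return W, b


def project(W, b, x):
    """Apply the learned map to a (possibly unseen) sample."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def reference(x):
    """Hypothetical projection being imitated (linear, so the fit can be exact)."""
    return [x[0] - 0.5 * x[2], 0.3 * x[1] + 1.0]


rng = random.Random(0)
train = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(40)]
W, b = fit_linear_projection(train, [reference(x) for x in train])
unseen = [0.2, -0.4, 0.9]
print(project(W, b, unseen), reference(unseen))
```

Once trained, projecting a new sample costs a single matrix-vector product, which is the property that makes learned projections fast and deterministic on out-of-sample data; imitating a nonlinear method such as t-SNE would require a nonlinear model in place of the linear map.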

dataset, projection, projection technique

arXiv.org Machine Learning

1902.07958

Country:

Europe > Portugal > Coimbra > Coimbra (0.04)
Europe > Netherlands (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)
